Open vocabulary speech recognition with flat hybrid models

Authors

  • Maximilian Bisani
  • Hermann Ney
Abstract

Today’s speech recognition systems are able to recognize arbitrary sentences over a large but finite vocabulary. However, many important speech recognition tasks feature an open, constantly changing vocabulary (e.g., broadcast news transcription, translation of political debates, etc.). Ideally, a system designed for such open vocabulary tasks would be able to recognize arbitrary, even previously unseen words. To some extent this can be achieved by using sub-lexical language models. We demonstrate that, by using a simple flat hybrid model, we can significantly improve a well-optimized state-of-the-art speech recognition system over a wide range of out-of-vocabulary rates.
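To make the two quantities in the abstract concrete, the following is a minimal sketch of how an out-of-vocabulary (OOV) rate can be measured and how a "flat" hybrid token stream might be formed, in which in-vocabulary words and sub-lexical fragments share a single sequence. All function names here are hypothetical, and the fixed-length character chunking in `to_fragments` is only an illustrative stand-in: the abstract does not specify how the sub-lexical units are derived.

```python
from collections import Counter


def build_vocabulary(train_tokens, size):
    """Keep the `size` most frequent training words as the closed vocabulary."""
    counts = Counter(train_tokens)
    return {word for word, _ in counts.most_common(size)}


def oov_rate(tokens, vocabulary):
    """Fraction of running words that are not covered by the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


def to_fragments(word, length=3):
    """Hypothetical segmenter (assumption): fixed-length character chunks.

    The leading '+' only marks a token as a fragment rather than a word.
    """
    return [f"+{word[i:i + length]}" for i in range(0, len(word), length)]


def flatten(tokens, vocabulary, length=3):
    """Build a flat hybrid stream: known words stay whole, OOV words
    are replaced in place by their fragments."""
    out = []
    for t in tokens:
        if t in vocabulary:
            out.append(t)
        else:
            out.extend(to_fragments(t, length))
    return out


train = "the news about the debate and the news".split()
test = "the debate news".split()

vocab = build_vocabulary(train, size=2)   # {"the", "news"}
rate = oov_rate(test, vocab)              # one of three test words is OOV
hybrid = flatten(test, vocab)             # ['the', '+deb', '+ate', 'news']
```

A language model estimated on text such as `hybrid` assigns probability to fragment sequences as well as to words, which is what lets a recognizer hypothesize words outside its fixed vocabulary.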

Similar resources

Using Graphone Models in Automatic Speech Recognition by Stanley

This research explores applications of joint letter-phoneme subwords, known as graphones, in several domains to enable detection and recognition of previously unknown words. For these experiments, graphone models are integrated into the SUMMIT speech recognition framework. First, graphones are applied to automatically generate pronunciations of restaurant names for a speech recognizer. Word re...

Hierarchical hybrid language models for open vocabulary continuous speech recognition using WFST

One of the main challenges in automatic speech recognition is recognizing an open, partly unseen vocabulary. To implicitly reduce the out-of-vocabulary (OOV) rate, hybrid vocabularies consisting of full-words and sub-words are used. Nevertheless, when using subwords, OOV rates are not necessarily zero. In this work, we propose the use of separate character level graphones (orthography and phone...

Hybrid Language Models Using Mixed Types of Sub-Lexical Units for Open Vocabulary German LVCSR

German is a highly inflected language with a large number of words derived from the same root. It makes heavy use of word compounding, leading to high out-of-vocabulary (OOV) rates and language model (LM) perplexities. For such languages, the use of sub-lexical units for large vocabulary continuous speech recognition (LVCSR) becomes a natural choice. In this paper, we investigate the ...

Investigation of Maximum Entropy Hybrid Language Models for Open Vocabulary German and Polish LVCSR

For languages like German and Polish, higher numbers of word inflections lead to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Thus, one of the main challenges in large vocabulary continuous speech recognition (LVCSR) is recognizing an open vocabulary. In this paper, we investigate the use of mixed types of sub-word units in the same recognition lexicon. Namely, m...

Publication date: 2005